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bJO Abstract 

^ This paper introduces a new sweeping preconditioner for the iterative solution of 

^^ the variable coefficient Helmholtz equation in two and three dimensions. The algo- 

CN| rithms follow the general structure of constructing an approximate LDL^ factorization 

^^ by eliminating the unknowns layer by layer starting from an absorbing layer or bound- 

ary condition. The central idea of this paper is to approximate the Schur complement 
matrices of the factorization using moving perfectly matched layers (PMLs) introduced 
in the interior of the domain. Applying each Schur complement matrix is equivalent 
^ r^ to solving a quasi- ID problem with a banded LU factorization in the 2D case and to 

^ solving a quasi-2D problem with a multifrontal method in the 3D case. The resulting 

preconditioner has linear application cost and the preconditioned iterative solver con- 
verges in a number of iterations that is essentially indefinite of the number of unknowns 
or the frequency. Numerical results are presented in both two and three dimensions to 
vN demonstrate the efficiency of this new preconditioner. 

^I^ Keyw^ords. Helmholtz equation, perfectly matched layers, high frequency waves, pre- 

^ conditioners, LDL^ factorization. Green's functions, multifrontal methods, optimal order- 
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1 Introduction 

This is the second of a series of papers on developing efficient preconditioners for the 
;_j numerical solutions of the Helmholtz equation in two and three dimensions. To be specific, 

^ let the domain of interest be the unit box D — (0, 1)^ with d = 2, 3. The time-independent 

wave field u{x) ior x ^ D satisfies the following Helmholtz equation, 

Au(x) + -7r-—u(x) = fix), 

where cj is the angular frequency, c{x) is the velocity field and, f{x) is the external force. 
Commonly used boundary conditions are the approximations of the Sommerfeld condition 
which guarantees that the wave field generated by f{x) propagates out of the domain and 
other boundary condition for part of the boundary can also be considered. By appropriately 
rescaling the system, it is convenient to assume that the mean of c{x) is around 1. Then 
^ is the (average) wave number of this problem and A = ^ is the (typical) wavelength. 

Equations of the Helmholtz type appear commonly in acoustics, elasticity, electromag- 
netics, geophysics, and quantum mechanics. Efficient and accurate numerical solution of 



the Helmholtz equation is a very important problem in current numerical mathematics. 
This is, however, a very difficult computational task due to two main reasons. First, in a 
typical setting, the Helmholtz equation is discretized with at least a constant number of 
points per wavelength. Therefore, the number of samples n in each dimension is propor- 
tional to cj, the total number of samples N \s n^ — 0{uj^)^ and the approximating discrete 
system of the Helmholtz equation is an 0{(jJ^) x 0{(jJ^) linear system, which is extremely 
large in many practical high frequency simulations. Second, since the discrete system is 
highly indefinite and has a very oscillatory Green's function due to the wave nature of the 
Helmholtz equation, most direct and iterative solvers developed based on the multiscale 
paradigm are no longer efficient anymore. For further remarks, see the discussion in [12j. 

1.1 Approach and contribution 

In the previous paper [12j, we introduced a sweeping preconditioner that constructs an ap- 
proximate LDL^ factorization layer by layer starting from an absorbing layer. An important 
observation regarding the sweeping preconditioner is that the intermediate Schur comple- 
ment matrices of the LDL^ factorization corresponds to the restriction of the half-space 
Green's function of the Helmholtz equation to a single layer. In [12j, we represented the 
intermediate Schur complement matrices of the factorization efficiently in the hierarchical 
matrix framework [16]. In 2D, the efficiency of this preconditioner is supported by analysis, 
has linear complexity, and results very small number of iterations when combined with the 
GMRES solver. In 3D, however, the theoretical justification is lacking and constructing the 
preconditioner can be more costly. 

In this paper, we propose a new sweeping preconditioner that works well in both two 
and three dimensions. The central idea of this new approach is to represent these Schur 
complement matrices in terms of moving perfectly matched layers introduced in the interior 
of the domain. Applying these Schur complement matrices then corresponds to inverting 
a discrete Helmholtz system of a moving PML. Since each moving PML is only of a few 
grids wide, fast direct algorithms can be leveraged for this task. In 2D, this discrete 
system of the moving PML layer is a quasi- ID problem and can be solved efficiently using 
a banded LU factorization in an appropriate ordering. The construction and application 
costs of the preconditioner are O(n^) = 0{N) and 0{n^) = 0{N)^ respectively. In 3D, 
the discrete Helmholtz system of the moving PML is a quasi-2D problem and can be 
solved efficiently using the multifrontal methods. The construction and application costs 
of the preconditioner are O(n^) = 0{N^''^) and O(n^logn) = O(A^logA^), respectively. 
Numerical results show that in both 2D and 3D this new sweeping preconditioner gives rise 
to iteration numbers that is essentially independent of N when combined with the GMRES 
solver. After the construction of the preconditioner, we thus have a linear solution method 
for the discrete Helmholtz system. 

1.2 Related work 

There has been a vast literature on developing efficient algorithms for the Helmholtz equa- 
tion. A partial hst of significant progresses includes [3llll|6l[8l|10llIIl|lllIinil23l[MJ. We 
refer to the review article [13j and our previous paper [T2] for detailed discussion. The brief 
discussion below is restricted to the ones that are closely related to the approach proposed 
in this paper. 

The most efficient direct methods for solving the discrete Helmholtz systems are the 
multifrontal methods or their pivoted versions [9l[T5l|22]. The multifrontal methods exploit 



the locality of the discrete operator and construct an LDL^ factorization based on a hier- 
archical partitioning of the domain. The cost of a multifrontal method depends strongly 
on the number of dimensions. For a 2D problem with N — v? unknowns, a multifrontal 
method takes 0(A^^/^) flops and 0(A^log A^) storage space. The prefactor is usually rather 
small, making the multifrontal methods effectively the default choice for most 2D Helmholtz 
problems. However, for a 3D problem with N — v? unknowns, a multifrontal method re- 
quires 0{N'^) flops and 0{N^/^) storage space, which can be very costly for large scale 3D 
problems. 

The approach proposed here essentially reduces the dimensions of the problem by work- 
ing with n subproblems with one dimension lower. In the 3D case, for each subproblem, it 
leverages the effectiveness of the 2D multifrontal methods by solving a quasi-2D problem. 
The price of this reduction is that we only end up with an approximate inverse. However, 
this approximate inverse is reasonably accurate and works very well as a preconditioner 
when combined with standard iterative solvers in all our variable coefficient test cases. 

1.3 Contents 

The rest of this paper is organized as follows. Section [2] presents the new sweeping pre- 
conditioner in the 2D case and Section [3] reports the 2D numerical results. We extend this 
approach to the 3D case in Section [4] and report the 3D numerical results in Section [5j 
Finally, Section [6] discusses some future directions of this work. 

2 Preconditioner in 2D 

We will first discuss the sweeping factorization in general and then introduce the moving 
PML. 

2.1 Discretization and sweeping factorization 

Recall that our computational domain in 2D is I? = (O?!)^- ^^ order to simplify the 
discussion, we assume that the Dirichlet zero boundary condition is used on the side X2 = 1 
while approximations to the Sommerfeld boundary condition is enforced on the other three 
sides. One standard way of incorporating the Sommerfeld boundary condition is to use the 
perfectly matched layer (PML) [5l[7l[l7]. Introduce 
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Here 77 is typically around one wavelength and C is an appropriate positive constant inde- 
pendent of (jj. The PML method replaces di with si(xi)di and 82 with S2{x2)d2, respec- 
tively. This effectively provides a damping layer of width 77 near the three sides with the 



Somnierfeld boundary condition. The resulting equation becomes 
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We assume that f{x) is supported inside [77, 1 — 77] x [77, 1] (away from the PML). Dividing 
the above equation by Si(xi)s2(x2) results 
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The main advantage of this equation is its symmetry. We discretize the domain [0, 1]^ with 
a Cartesian grid with spacing h = 1/(72+ 1). The number of points n in each dimension is 
proportional to the wave number cj since a constant number of points is required for each 
wavelength. The set of all interior points of this grid is denoted by 

'^ = {PiJ = (ihjh) :1 <ij <n} 



(see Figure [T] (left)) and the total number of grid points is A^ 
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Figure 1: Left: Discretization grid in 2D. Right: Sweeping order in 2D with the moving 
PML. The dotted grid indicates the part that has already been eliminated. 

We denote by Uij, fij, and Cij the values oiu(x), f{x), and c{x) at point pij = (ih^jh). 
The 5-point stencil finite difference method writes down the equation at points in V using 
central difference. The resulting equation at Xij = (ih^jh) is 
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(•••)Ui,i = /i,i (2) 



with Uifjf equal to zero for (i^j^) that violates 1 < z',/ < n. Here (• • • ) stands for the sum 
of the four coefficients that appear in the first line. We order both Uij and fij row by row 
starting from the first row j — 1 and define the vectors 

U = (i/i^i, iX2,l, . . . , Un^l, . . . , ^l,n, '^2,n, • • • , ^n,n) , 
/ ^ v/l,l5 /2,l5 • • • 5 /n,l5 • • • 5 /l,n5 /2,n5 • • • 5 Jn^n) • 



Denote the discrete system of ^ by Au = /. We further introduce a block version of it by 
defining Vm to be set of the indices in the 7n-th row 

and introducing 

'^771 ^ v'^l,?7Z5 '^2,7715 ' ' ' 5 '^n,?7zj ancl Jj^i ^ [jl^m^ 72, 7715 ' ' ' 5 Jn,m) ' 

Then 

Using these notations, the system Au = / takes the fohowing block tridiagonal form 

^2,1 ^2,2 ■•• ^i2 ^ /2 

■• ■• 4 1 : : 

^T-n— l,n 
\ An,n-1 An,n J V^^/ V-^^/ 

where A^^^ are tridiagonal and A^^^-i = ^In-i m ^^^ diagonal matrices. 

The sweeping factorization of the matrix A is essentially a block LDL^ factorization 
that eliminates the unknowns layer by layer, starting from the absorbing layer near X2 = 0. 
The result of this process is a factorization 
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where Si = Ai^i, Sm = Am^rn - ^m,m-iS'rn-i^m-i,m for m = 2, . . . , n, and Lk is given by 
Lk{Vk+i,Vk) = Ak+i^kS^^, Lk{Vi,Vi) = / (1 < i < n), and zero otherwise. 

This process is illustrated graphically in Figure [l] (right). Inverting this factorization for A 
gives the following formula for u: 



u ^ {L\)-' . . . {Li_,)-^ 
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Algorithmically, the construction of the sweeping factorization of A can be summarized as 
follows by introducing T^ = S^^. 

Algorithm 2.1. Construction of the sweeping factorization of A. 



Si = Ai,i andTi = Sf \ 
for m = 2, . . . , n do 

O777, ^ J\jji^jji ^m,m—l-^m—l^m—l,m clIKd Ijyi ^ o^^ . 

end for 



Since Sm and T^ are in general dense matrices of size n x n^ the cost of the construction 
algorithm is of order O(n^) = 0{N'^). The computation oi u = ^~^f is carried out in the 
following algorithm once the factorization is ready. 

Algorithm 2.2. Computation of u — A~^f using the sweeping factorization of A. 

1: for 771 = 1, ... ,n do 

2: UjYi ^ Jttt, 

3: end for 

4: for 771 = 1, . . . , n — 1 do 

5: '^771+1 ^ '^771+1 ^m-\-l^m\J- m'^m) 

6: end for 

7: for 771 = 1, ... ,77 do 

9: end for 

10: for 771 = 77 — 1, . . . , 1 do 

11: Ujji ^ Ujji 1 jjiyJ\jjijjij^\Ujyi-^\) 

12: end for 

Obviously the computations of Tr^Um in the second and the third loops only needs to be 
carried out once for each 771. We prefer to write the algorithm this way for the simplicity 
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of presentation. The cost of computing u with Algorithm 
which is about 0{N^''^) times more expensive compared to the multifrontal method. There- 
fore, these two algorithms themselves are not very useful. 



2.2 Moving PML 



In Algorithms |2.1| and |2.2| the dominant cost is the construction and the application of 
the matrices T^. In [12j . we emphasized the physical meaning of the Schur complement 
matrices T^ of the sweeping factorization. Consider only the top-left m x m block of the 
above factorization. 
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where the Lj^ matrices are redefined to their restriction to the top-left m x m blocks. The 
matrix on the left is in fact the discrete Helmholtz equation restricted to the half space below 
X2 — (771 + l)/i and with zero boundary condition on this line. Inverting the factorization 
Q gives 
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The matrix on the left side is an approximation of the discrete half-space Green's function 
of the Helmholtz operator with zero boundary condition. On the right side, due to the 
definition of the matrices Li, . . . , Lm-i, the (771, 77i)-th block of the product is exactly equal 
to S:^^. Therefore, 



^r 



^m^ approximates the discrete half-space Green function of the Helmholtz operator 
with zero boundary at X2 = (m + l)/i, restricted to the points on X2 = mh. 



Trr 



In the previous paper |T2], T^ is approximated using the hierarchical matrix framework. 
Due to the fact that the 3D Green's function, restricted to a plane, propagates oscillations 
in all directions, the theoretical justification of that method is lacking in 3D. Here, we try 
to approximate the matrix T^ in a different way. 

As an operator, T^ '• 9m ^ "^m maps an external force g^ loaded only on the 777.-th layer 
to the solution Vj^ restricted to the same layer. Though it is a map between quantities only 
defined on the 777.-th layer, the computation domain includes all first 777. layers with the PML 
padded near X2 = 0. However, since the force Qjn is only loaded on the 777.-th layer, there is 
no reason to keep the PML layer near X2 = if one can be satisfied with an approximation. 

The central idea is to push the PML from X2 = right next to X2 = mh. 

To make this precise, let us assume that the width 77 of the PML is an integer multiple of 
h and let b = r]/h be the number of grid points in PML layer in the transversal direction. 
Define 

,m/ N / 1 , ,(T2{X2 - {m - b)h)Y^ 

52 [X2) 
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and introduce an auxiliary problem on the domain Djn = [0, 1] x [(777. — b)h, (m + l)/i]: 
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This equation is discretized with the subgrid 
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of the original grid V and the resulting 677. x 677. discrete Helmholtz operator is denoted by 
Hm- Following the main idea mentioned above, the operator T^ '• Qm ^ "^m defined through 
Hm by 
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is an approximation to the matrix T^. Notice that applying Tm to an arbitrary vector g^ 
involves solving a linear system of matrix i7^, which comes from the local 5-point stencil 
on the narrow grid Gm that contains only b layers. Let us introduce a new ordering for Gm 



Pl,m-6+l,Pl,m-6+2, • • • ,Pl;. 



• Pn,m—b-\-l, Pn,m—b-\-2i • • • ,Pn 



that iterates through the X2 direction and denote the permutation matrix induced from 
this new ordering by Pm- Now the matrix PmHmP^ is a banded matrix with only 6—1 
lower diagonals and 6—1 upper diagonals. It is well known that the LU factorization 
LfnUm — PmHmPm ^^^ ^c Constructed efficiently. As a result, the application of T^ can 
be carried out rapidly. 

We call this approach the moving PML method, since these new PMLs do not exist in 
the original problem as they are only introduced in order to approximate T^ efficiently. 



In the above discussion, the moving PML is pushed right next to X2 — mh. However, 
in general we can place the moving PML at a location that is a few layers away from 
X2 = mh. The potential advantage of keeping a few extra layers as a buffer is that the 
resulting approximation T^ is more accurate. On the other hand, since there are more layers 
in the subgrid Qj^ for each tti, the computational cost grows accordingly. In our numerical 
tests, we observe that extra buffer layers provide little improvement on the approximation 
accuracy and hence the moving PML is indeed pushed right next to X2 = mh. 

The application of a PML right next to the layer to be eliminated corresponds to a 
PML or absorbing boundary condition next to a Dirichlet boundary condition. This has 
been used as an asymptotic technique for high frequency scattering under the name of on- 
surface radiation boundary condition (OSRBC) [21IIB3- The OSRBC is an approximation 
that is more accurate than physical optics but, of course, not as accurate as a full boundary 
integral formulation. 



2.3 Approximate inversion and preconditioner 



Let us incorporate the moving PML technique into Algorithms |2 . 1 1 and 2.2 The computa- 
tion at the first (b -\- 1) layers needs to be handled differently, since it does not make sense 
to introduce moving PML for these initial layers. Let us call the first b layers the front part 
and define 

UF = {u\,..., ulY and fF = (/i, •••Jl 

Then we can rewrite Au = / as 
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The construction of the approximate sweeping factorization of A takes the following steps. 
Notice that since T^ are approximated directly there is no need to compute Sm anymore. 

Algorithm 2.3. Construction of the approximate sweeping factorization of A with moving 
PML. 

1: Let Gf be the subgrid of the first b layers, Hf = Af^Fi ^ind Pf be the permutation 
induce by the new ordering (x2 first) of Gf- Construct the LU factorization LfUf = 
PfHfPf- This factorization implicitly defines fp : C^^ -^ C^^. 
2: for m = 6 + 1, . . . , n do 

3: Let Gm — {Pij^ 1 ^ '^ ^ '^, 771—6+1 < j < m}, Hjri be the discrete system of ^ on Gm^ 

and Pjn be the permutation induced by the new ordering of Gm- Construct the LU 

factorization Lr^Um — PmHmPm- This factorization implicitly defines T^ : C^ -^ C^. 

4: end for 

The cost of Algorithm [zs] is 0{b^n^) = 0{b^N). The computation oi u ^ A~^ f using the 

constructed sweeping factorization is summarized in the following algorithm 
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Algorithm 2.4. Computation of u 
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i/5+1 = '^6+1 — ^6+i,f(2V'^f)- TpUF is computed as PpUp L^ Ppup. 
for 77i = 6+l,...,n— Ido 

^m+i = ^m+1 - ^m+i,m(rm'^m)- The application of TmUm is done by forming the 

vector (0, . . . , 0, ul^Y^ applying P^U^^L^Pm to it, and extracting the value on the 

last layer. 
end for 

uf = Tpup. See the previous steps for the application of Tp. 
for 7Ti = 6+l,...,ndo 

Um — TjnUm. See the previous steps for the application of T^. 
end for 
for 7Ti = n — 1,...,6+1 do 

Um — Um — Tjn(Ajn^jn^iUm+i) • See the previous steps for the application of T^. 
end for 

Uf = Uf — TF{AF^b+iUb+i)' See the previous steps for the application of Tp. 
The cost of Algorithm 2.4 is 0{b^n^) = 0{b'^N). Since 6 is a fixed constant, the cost is 



essentially linear. Algorithm |2.4| defines an operator 



nJ ' 



which is an approximate inverse of the discrete Helmholtz operator A. Due to the indefi- 
niteness of A, this approximate inverse might suffer from instability. In practice, instead of 
generating the sweeping factorization of the original matrix A, we choose to generate the 
factorization for the matrix A^ associated with the modified Helmholtz equation 

Auix) + ^^^^±^u{x) = fix), (6) 

C [X) 

where a is an 0(1) positive constant. We denote by M^ : f ^ u the operator defined by 
Algorithm |2.4| with this modified equation. We would like to emphasize that mh is very 
different from the equation used in the shifted Laplacian approach (for example [Mt [19]): 
in the shifted Laplacian formulation the imaginary part of the operator is 0{uj) while here 
the imaginary part is 0(1). 

Since a is small, A^ is close to A. Therefore, we propose to solve the preconditioner 
system 

MaAu = Maf 

using the GMRES solver [25] |26]. As the cost of applying M^ to any vector is 0{in?) = 
0{N), the total cost of the iterative solver scales like O(NjN), where Nj is the number 
of iterations. As the numerical results in Section |3] demonstrate, Nj depends at most 
logarithmically on A^, thus resulting a solver with almost linear complexity. 

The problem considered so far has zero Dirichlet boundary condition on X2 = 1. A 
common situation is to impose PML at all sides. In this case, the algorithms need a slight 
modification. Instead of sweeping upward from X2 = 0, the algorithm sweeps with two 
fronts, one from X2 = upward and the other from X2 = 1 downward (see Figure^ (left)). 
Similar to uf and /f near X2 = 0, we introduce 

^^ = «-6+i. •••.<)' and fE^{fi_^,^^,...,fJ 
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Figure 2: Different sweeping patterns. Left: For problems with PML at both X2 = and 
X2 = 1, the algorithm sweeps from both ends towards the center. Right: Instead of one 
layer, multiple layers of unknowns can be eliminated within each iteration of the algorithm. 



and write Au = / in the following block form 
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The upward sweep goes through tti = F, 6+ 1, . . . , (n — l)/2, and the downward sweep visits 
m — E^n — h^ . . . ^{n ^ 3)/2. Finally, the algorithm visits the middle layer m — {n^ l)/2 
with moving PMLs on both sides. 

Algorithm |2.3| eliminates one layer of unknowns within each iteration. We can also 



instead eliminate several layers of unknowns together within each iteration (see Figure ^ 
(right)). The resulting algorithm spends more computational time within each elimination 
step, since the discrete system Hj^ contains more layers in the X2 dimension. On the other 
hand, the number of elimination steps goes down by a factor equal to the number of layers 
processed within each elimination step. In practice, the actual number d of layers processed 
within each step depends on the width of the moving PML and is chosen to minimize the 
overall computation time and storage. 



3 Numerical Results in 2D 

In this section, we present several numerical results to illustrate the properties of the sweep- 
ing preconditioner described in Section [2j The algorithms are implemented in Matlab and 
all tests are performed on a computer with a 2.6GHz CPU. We use GMRES as the iterative 
solver with relative residue tolerance equal to 10~^. 

3.1 PML 

The examples in this seciton have the PML boundary condition specified at all sides. We 
consider three velocity fields in the domain D — (0, 1)^: 

1. The first velocity field corresponds to a smooth converging lens with a Gaussian profile 
at the center of the domain (see Figure [s]^ a)). 
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2. The second velocity field is a vertical waveguide with Gaussian cross section (see 
Figure §b)). 

3. The third velocity field has a random velocity field (see Figure [SJ^c)). 
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Figure 3: Test velocity fields. 
For each velocity field, we test with two external forces f{x). 

1. The first external force f{x) is a narrow Gaussian point source located at (xi,X2) = 
(0.5,0.125). The response of this forcing term generates circular waves propagating 
at all directions. Due to the variations of the velocity field, the circular waves are 
going to bend, form caustics, and intersect. 

2. The second external force f{x) is a Gaussian wave packet whose wavelength is com- 
parable to the typical wavelength of the domain. This packet centers at (xi,X2) = 
(0.125,0.125) and points to the (1,1) direction. The response of this forcing term 
generates a Gaussian beam initially pointing towards the (1,1) direction. Due to the 
variations of the velocity field, this Gaussian beam bends and scatters. 

Firstly, we study how the sweeping preconditioner behaves when uo varies. For each 
velocity field, we perform tests for ^ = 16, 32, ... , 256. In these tests, we discretize each 
wavelength with g = 8 points and n — 127, 255, . . . , 2047. The a value of the modified 
system is set to be equal to 2. The width of the moving PML is equal to 12h (i.e. h = 12) 
and the number d of layers processed within each iteration of Algorithms |2.3| and |2.4| is 
equal to 12. The sweeping pattern indicated in Figure [2] (left) is used in these tests. 

The results of the first velocity field are summarized in Table fl] Tgetup denotes the time 
used to construct the preconditioner in seconds. For each external force, A^iter is the number 
of iterations of the preconditioned GMRES iteration and Tgoive is the overall solution time. 
When uj and n double, N increases by a factor of 4 and the setup cost in Table [l] increases 
roughly by a factor of 4 as well, which is consistent with the 0{N) complexity of Algorithm 
|2.3[ At the same time, the number of iterations is essentially independent of n. As a result, 
the overall solution time increases by a factor of 4 or 5 when N quadruples, exhibiting the 
almost linear complexity of Algorithm |2.4[ 

The results of the second and third velocity fields are summarized in Tables [2] and 
[3j respectively. The quantitative behavior of these tests is similar to the one of the first 
velocity field. In all cases, the GMRES iteration converges in about 20 iterations with the 
sweeping preconditioner. 
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Table 1: Results of velocity field 1 with varying cj. Top: Solutions for two external forces 
with (jj/{27t) = 64. Bottom: Results for different u. 
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Table 2: Results of velocity field 2 with varying u. Top: Solutions for two external forces 
with uj /{2ti) — 64. Bottom: Results for different uj. 
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Table 3: Results of velocity field 3 with varying cj. Top: Solutions for two external forces 
with (jj/{27t) = 64. Bottom: Results for different uj. 

Secondly, we study how the sweeping preconditioner behaves when q (the number of 
discretization points per wavelength) varies. We fix ^ to be 32 and let q be 8, 16, 32, 64. 
The test results for the three velocity fields are summarized in Tables [4| [5} and [6j These 
results show that the number of iterations remains to scale at most logarithmically and the 
running time of the solution algorithm scales roughly linearly with respect to the number 
of unknowns. 





Test 1 


Test 2 


cu/(27r) 


Q 


N = n^ 


-^ setup 


iViter 


-^ solve 


iViter 


-^ solve 


32 


8 


2552 


9.19e-01 


15 


1.65e+00 


15 


1.61e+00 


32 


16 


5112 


3.91e+00 


14 


6.94e+00 


15 


7.22e+00 


32 


32 


10232 


1.59e+01 


17 


8.87e+01 


17 


9.39e+01 


32 


64 


20472 


6.68e+01 


19 


3.74e+02 


20 


4.15e+02 



Table 4: Results of velocity field 1 with varying q. 



Let us compare these numerical results with the ones from the previous paper [T2] . 
The algorithms proposed in this paper are implemented in Matlab, while the ones in [12] 
are implemented in C++ with compiler optimization. Hence, direct comparison of the 
running time is clearly in favor of the algorithms in the previous paper. We would expect 
the running time of the algorithms in this paper to improve by a factor of 2 to 3 when 
implemented in optimized C++ code. Even with this implementational disadvantage, the 
setup time Tgetup of the current approach is about twenty times faster. This is mainly due 
to the fact that the implementation of the LU factorization is much more efficient compared 
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Table 5: Results of velocity field 2 with varying q. 
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Table 6: Results of velocity field 3 with varying q. 

to our implementation of the hierarchical matrix algebra in [12j. On the other hand, the 
number of iterations A^iter and solution time Tgoive of the current algorithms are higher. 
This is because in [12] T^ are represented by numerical low-rank approximations of the full 
half-space Green's function while in the current approach T^ are approximated based on a 
modified Helmholtz equation in a truncated domain. 

3.2 Scattering problem 

The sweeping preconditioner proposed in this paper can also be extended to scattering 
problems. Let us consider a simple case where the scatterer is a sound soft disk centered 
at the origin with radius rg. In polar coordinates, the scattered field satisfies the following 
equations 



1. 

- [rur 
r 



w 






where Ui^c is the incident field and the Sommerfeld boundary condition is specified for 
r goes to infinity. One way to solve this scattering problem is to truncate the domain 
at r = ri for some ri > rg and apply the PML condition at r = ri. We can then 
apply the sweeping preconditioner in the radial direction from r = ri to r = rg. In the 
following example, c(r^9) = 1, rg = 0.15 and ri = 0.5. The polar grid is determined so 
that the each wavelength is discretized with g = 8 points. For each fixed cj, two incident 
fields are used: one is the Green's function centered at (—0.2, 0.2) and the other is the 
plane wave exp(— zc<JX2) traveling towards the negative X2 direction. We perform tests for 
^ = 16, 32, 64, 128, 256 and the numerical results are reported in Table [Tj 

4 Preconditioner in 3D 

The presentation of the 3D preconditioner follows the layout of the 2D case. 
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Table 7: Results of the scattering problem. Top: Scattered fields for two incident waves 
with uj/{2ti) = 64. Bottom: Results for different uj. 



4.1 Discretization and sweeping factorization 

The computational domain is D — (0, 1)^. Similar to the 2D case, assume that the Dirichlet 
boundary condition is used on the side X3 = 1 and the Sommerfeld boundary condition is 
enforced on other sides. Define 



CTl{t) ^ CJ2{t) 



( t-v 



(^ 



l±v\ 



te[ri,l-r]], a3it) 
tG[l-r?,l] 



c 
n 





( t-v 



te[0,7]] 

t G [r?, 1] 



and 



5i(xi) = 1 + i 



.cr(xi) 



UJ 



52(^2) = 1 + i 



.Cr(x2) 



CJ 



, ss{xs) = 1 + z 



CJ 



The PML approach replaces di, 82, and 82 with 5i(xi)5i, S2{x2)82, and 53(x3)53, respec- 
tively. This effectively provides a damping layer of width 77 near the sides with Sommerfeld 
condition. The resulting equation takes the form 

,2 



{Si8i){si8i) + {S282){S282) + {S383){S383) + 



UJ 



c^{x) 



f ^^(o,l)^ 

xe8{[0,lf) 



We assume that f{x) is supported inside [77, 1 — 77] x [77, 1 — 77] x [77, 1] (away from the PML). 
Dividing the above equation by Si(xi)s2(x2)ss(xs) results 

,2 



S2S3 



S1S3 



UJ 



dl{^^di]+d2{^^d2]+ds{^^ds]+ _ _ "_,,_, ]u^f. 



S1S2 



SlS2S3C^{x) 
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The domain [0, 1]^ is discretized with a Cartesian grid with spacing h = l/(^ + 1), where 
the number of points n in each dimension is proportional to cj. The interior points of this 
grid are 

T^ = {PiJ,k = {ihjh,kh) :1 <ij,k <n} 

(see Figured (left)) and the total number of grid points is A^ = n^. 




,0,0) 




(0,0,0) 



Figure 4: Left: Discretization grid in 3D. Right: Sweeping order in 3D. The remaining grid 
shows the unknowns yet to be processed. 

We denote by Uij^ki fij,ki ^ind Cij^k the values of u{x)^ f{x), and c{x) at point Xij^k — 
{ih^jh^ kh). The standard 7-point stencil finite difference method writes down the equation 
at points in V using central difference. The resulting equation at {ih^jh, kh) is 

1 / 51 \ ^ ^ f '^\ ^ ^ f '^\ 



h^ Ksissj,^^^!^ h^ \siS2j,^^^^_i h^ \siS2 



^ij,k+l 



+ 



(jj 



^{siS2Ss)ij^k • C.j^^ 



hJJ+2 

(• • • ) Uij^k = fij.k (7) 



with Uii^ji^k' equal to zero for (i^/, k') that violates 1 < ^^y, k^ < n. Here (• • • ) stands for 
the sum of the six coefficients appeared in the first two lines. We order Uij^k ^tnd fij^k by 
going through the three dimensions in order and denote the vectors containing by 

U = ('^1,1,1, '^2,1,1, • • • , '^71,1,1, • • • , Ui^n,ni '^2,n,n, • • • , Un^n.n) 
J ^ v/l,l,l5 /2,l,l5 • • • 5 /n,l,l5 • • • 5 /l, 71,715 /2,7i,7i5 ' ' ' 5 Jn,n,n) 

The discrete system of Q takes the form Au = /. We further introduce a block version. 
Define Vm to be the indices in the 7n-th row 



' m — \Pl, 1, mi P2, 1,1711 • • • iPn,n,mj 



and introduce 



"^m — v'^1, 1,777-5 '^2,1,7775 ' ' ' 5 '^77,77,777J aUQ Jfyi — (,/l, 1,7775 /2, 1,7775 ' ' ' 5 Jn,n,', 
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Then 



Using these notations, the system Au — f takes the following block tridiagonal form 

Ml,l ^1,2 
^2,1 ^2,2 



A. 



\ 



A. 



n,n—l 



A 





U2 




/2 




\Un) 




V/n/ 



where each block is of size v? x v? and the off-diagonal blocks are diagonal matrices. The 
sweeping factorization takes the same form as the 2D one ([s]). In order to design an efficient 
preconditioner, the main task task is to construct approximations for the Schur complement 
matrix T^ '. Qm ^ "^mi which maps an external force Qm loaded only on the 7Ti-th layer to 
the solution i;^ restricted to the same layer. Following the central idea of pushing the PML 
right next to X3 = mh^ we define 



^^(^3) 



l + i 



.0-3 (x3 - (m- b)h) 



UJ 



- [0, 1] 


X [0, 1] X 


[{m — h)h, [m 


+ l)h]: 


c^2 ^ 


u = f 


X fc J^rm 


(8) 




« = 


X e dDm- 





and introduce an auxiliary problem on the domain Dr^ 



{Sldi){sidi) + {S2d2){s2d2) + {s^d3)is^d3) + 



This equation is then discretized with the subgrid 

Gm = {Pij.k, l<ij<n,m-b+l<k<m} 

of the original grid V. The resulting bn? x bn? discrete Helmholtz operator is denoted by 
Hm- The operator T^ '■ Qm ^ "^m defined by 



* 

\VmJ 



H^ 



/0\ 


\9mJ 



is an approximation of the Schur complement matrix T^. Since Hm comes from the 7- 
point stencil with b layers, this can be viewed as a quasi-2D problem, which can be solved 
efficiently using a modified version of the multifrontal method O [TH [22]. 

The main idea of the multifrontal method is simple yet elegant. Take a n x n 2D grid 
as an example and use M to denote the discrete operator resulted from a local stencil. The 
multifrontal method reorders the unknowns hierarchically in order to minimize the fill-ins 
of the LDL^ factorization of M. For the n x n Cartesian grid, one possible ordering is 
given in Figure [5] where the unknowns are clustered into groups and the groups are ordered 
hierarchically. The construction of the LDL^ factorization eliminates the unknowns group 
by group. The dominating cost of the algorithm is spent in inverting the unknowns of the 
last few groups and the overall cost is O(n^), cubic in terms of the size of the last group. 
Moreover, the L matrix is never constructed explicitly in the multifrontal method. Instead 
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it is stored and applied as a sequence of (block) row operations for the sake of efficiency. 
Applying M~^ to an arbitrary vector using the result of the multifrontal algorithm takes 
O(n^logn) steps. In the current setting, we adopt the same hierarchical partitioning in 
the (xi,X2) plane, while keeping the unknowns with the same xi and X2 indices in the 
same group. Since now the size of the last group is of order 0(6n), the construction phase 
of the multifrontal method takes 0{b^n^) steps and applying to an arbitrary vector takes 
0(6^n^logn) steps. 
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Figure 5: Multifrontal algorithm on a 15 x 15 two dimensional Cartesian grid. Left: The 
unknowns are clustered into groups hierarchically to minimizes the boundary between dif- 
ferent groups. Right: Elimination order of different groups. The groups are eliminated in 
the increasing order of their indices. 



4.2 Approximate inversion and preconditioner 



Let us now combine the multifrontal method into Algorithms 2.1 and 2.2 to build the 
approximate inverse of H. Similar to the 2D case, we define 



UF^{U\,...,UIY /f = (/{,..., AT- 



and write 



V 



A, 



A 



n,n—l 



A 

-^J- 77,-77, 



\ Un J 



fb+1 



\fnj 



The goal of the construction of the approximate sweeping factorization of A is to compute 
Tjn and the algorithm consists of the following steps. 

Algorithm 4.1. Construction of the approximate sweeping factorization of A with moving 
PML. 

1: Let Qf be the subgrid of the first b layers and Hp = ^f,f- Construct the multifrontal 

factorization of Hp by partitioning Qp hierarchically in the (xi,X2) plane. 
2: for 7n = 6+l,...,ndo 
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3: Let Qm — {Pi,j,k^ l<z,j<n,77i — 6+l<A:< m} and Hj^ be the system of ([s]) on 
Gm- Construct the multifrontal factorization of H^ by partitioning Gm hierarchicahy 
in the (xi,X2) plane. 

4: end for 



The cost of Algorithm 4.1 is 0(6^n^) = 0{b^N^/^). The computation of u from this 



sweeping factorization is summarized in the following algorithm 

Algorithm 4.2. Computation of u ^ ^~^f using the sweeping factorization of A with 
moving PML. 

1: uf = fr and Uj^ = /^ for tti = 6 + 1, . . . , n. 

2: ui,^i = i/5+i — Ai,^i^jp{TfUf)- TpUp is computed using the multifrontal factorization 

3: for 7Ti = 6+l,...,n— Ido 

4: Um+i — Um+1 — ^m+i,m(2Tn'^m)- The application of TjnUm is donc by forming the 
vector (0, . . . , 0, u\^y ^ applying H^ to it using the multifrontal factorization of i7^, 
and extracting the value on the last layer. 
end for 

Up — Tpup. See the previous steps for the application of Tp. 
for m = 6 + 1, . . . , n do ^ 

Ujn — TmUm- See the previous steps for the application of T^. 
end for 
for 7Ti = n — 1,...,6+1 do 

Um — Ujn — Tjn(Ajn^jn^iUjn+i) • See the previous steps for the application of T^. 
end for 

Uf = Uf — Tf{Af^i,^iUi,^i). See the previous steps for the application of Tp. 
The cost of Algorithm [42I is 0{b'^n^logn) = 0{b'^NlogN). 
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For the stability reason mentioned in Section [2} we apply Algorithms 4J_ and 4^ to the 
discrete operator Aa of the modified system 

Au{x) + ^"'^/"^^ (x) = fix), 

C [X) 

where a is an 0(1) positive constant. We denote by M^ : f ^ u the operator defined 
by Algorithm |2.4| for this modified equation. Since A^ is close to A when a is small, we 
propose to solve the preconditioner system 

MaAu = Maf 

using the GMRES solver [25[ 126]. Because the cost of applying Ma to any vector is 
0(A^log7V), the total cost of the GMRES solver is 0{NjN log N), where Nj is the number 
of iterations required. As the numerical results in Section [5] demonstrate, Nj is essentially 
independent of the number of unknowns N, thus resulting an algorithm with almost linear 
complexity. 

5 Numerical Results in 3D 

In this section, we present several numerical results to illustrate the properties of the algo- 
rithm described in Section [H We use GMRES as the iterative solver with relative residue 
tolerance equal to 10~^. 
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The examples in this seciton have the PML boundary condition specified at all sides. 
We consider three velocity fields in the domain i? = (0, 1)^: 

1. The first velocity field is a converging lens with a Gaussian profile at the center of 
the domain (see Figure [6]^a)). 

2. The second velocity field is a vertical waveguide with Gaussian cross section (see 
Figure JGJ^b)). 

3. The third velocity field is a random velocity field (see Figure [6]^c)). 
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Figure 6: Test velocity fields. 
For each problem, we test with two external forces f{x). 

1. The first external force f(x) is a Gaussian point source located at {xi^X2^xs) = 
(0.5, 0.5, 0.25). The response of this forcing term generates circular waves propagating 
at all directions. Due to the variations of the velocity field, the circular waves are 
going to bend and form caustics. 

2. The second external force f(x) is a Gaussian wave packet whose wavelength is com- 
parable to the typical wavelength of the domain. This packet centers at (xi, X2, xs) = 
(0.5, 0.25, 0.25) and points to the (0, 1, 1) direction. The response of this forcing term 
generates a Gaussian beam initially pointing towards the (0, 1, 1) direction. 

Firstly, we study how the sweeping preconditioner behaves when uj varies. For each ve- 
locity field, we perform tests for ^ equal to 5, 10, 20. In these tests, we discretize each wave- 
length with q — i points and the number of samples in each dimension is n — 39, 79, 159. 
The a value of the modified system is set to be equal to 1. The width of the PML is equal 
to Qh (i.e., 6 = 6) and the number of layers processed within each iteration of Algorithms 



4.1 and |4.2| is equal to 3 (i.e., d = 3). The preconditioner sweeps the domain with two 
fronts that start from X3 = and X3 = 1. 

The results of the first velocity field is reported in Table [8) The two plots show the solu- 
tions of the two right sides on a plane near xi — 0.5. Tgetup is the time used to construct the 
preconditioner in seconds. A^iter is the number of iterations of the preconditioned GMRES 
iteration and Tgoive is the solution time. The estimate in Section |4] section shows that the 
setup time scales hke 0{N^/'^). So when uj doubles, N increases by a factor of 4 and Tgetup 
should increase by a factor of 16. The numerical results show that the actual growth factor 
is even lower. A remarkable feature of the sweeping preconditioner is that in all cases the 
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preconditioned GMRES solver converges in at most 12 iterations. Finally, we would like 
to point out that our algorithm is quite efficient: for the case with uo/{2ti) — 20 with more 
than four million unknowns, the solution time is less than 600 seconds. The results of the 



second and the third velocity fields are reported in Tables [9] and [T0| respectively. In all 
tests, the GMRES iteration converges at most 13 iterations when combined with the new 
sweeping preconditioner. 
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Table 8: Results of velocity field 1 with varying uj. Top: Solutions for two external forces 
with uj/{2ti) = 16 on a plane near xi — 0.5. Bottom: Results for different uj. 

Secondly, we study how the sweeping preconditioner behaves when q (the number of 
discretization points per wavelength) varies. We fix ^ to be 5 and let q be 8, 16, 32. 



The test results for the three velocity fields are summarized in Tables [TT] [12} and 13, 
respectively. These results show that the number of iterations remains roughly constant 
and the running time of the solution algorithm scales roughly linearly with respect to the 
number of unknowns. 

Let us compare these numerical results with the ones from the 3D results from the 
previous paper [12j. The setup time Tgetup of the current algorithms is much lower: for 
the problem of 20 wavelength across, the current setup time is in the hundreds of seconds 
while the setup time in [12] is in the tens of thousands of seconds. This is mainly due 
to the fact that our implementation of the multifrontal algorithm in this paper is more 
efficient compared to our implementation of the 2D hierarchical matrix algebra in [12j. 
The number of iterations A^iter is about 5 times larger, again due to the fact that the 
current algorithms use physical arguments about the Helmholtz equation rather than direct 
numerical approximation for T^. Notice that the solution time Tgoive is only about 3 
to 4 times larger and this is due to the efficiency of applying T^ using the multifrontal 
factorization. 
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-^ setup 


-^^iter -^ solve 


^^iter -^ solve 


5 

10 
20 


8 
8 
8 


393 
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1593 


4.83e+00 
6.76e+01 

8.24e+02 


12 5.14e+00 

13 5.70e+01 

14 6.32e+02 


12 5.03e+00 
12 5.64e+01 
11 5.40e+02 



Table 9: Results of velocity field 2 with varying u. Top: Solutions for two external forces 
with (jj/(27r) = 16 on a plane near xi = 0.5. Bottom: Results for different u. 
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c^/(27r) 
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N^n^ 


-^ setup 


-^^iter -^ solve 


-^^iter -^ solve 


5 

10 
20 


8 
8 
8 


393 
793 
1593 


4.85e+00 
6.69e+01 
8.42e+02 


12 5.26e+00 
11 5.10e+01 
11 5.58e+02 


12 5.44e+00 

13 5.99e+01 
13 6.28e+02 



Table 10: Results of velocity field 3 with varying u. Top: Solutions for two external forces 
with (jj/{27r) = 16 on a plane near xi — 0.5. Bottom: Results for different uj. 

6 Conclusion and Future Work 

In this paper, we proposed a new sweeping preconditioner for the Helmholtz equation in two 
and three dimensions. Similar to the previous paper [12], the preconditioner is based on an 
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-^^iter -^ solve 


-^^iter -L solve 


5 

5 
5 


8 39^ 4.87e+00 
16 793 6.59e+01 
32 1593 8.07e+02 


11 4.91e+00 
11 4.70e+01 
13 5.91e+02 


11 4.96e+00 

12 5.55e+01 

13 6.31e+02 


Table 11: Results of velocity field 1 with varying q. 
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q N^n^ Tsetup 


-^^iter -L solve 


-^^iter -L solve 


5 
5 
5 


8 393 4.80e+00 
16 793 6.74e+01 
32 159^ 8.18e+02 


12 5.36e+00 

13 5.53e+01 

14 6.48e+02 


12 4.95e+00 
12 5.51e+01 
14 6.45e+02 


Table 12: Results of velocity field 2 with varying q. 
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Test 2 


a;/(27r) 


q N = n^ Tsetup 


-^^iter -^ solve 


-^^iter -^ solve 


5 
5 
5 


8 393 4.82e+00 
16 793 6.77e+01 
32 1593 8.16e+02 


12 4.92e+00 

12 5.17e+01 

13 6.26e+02 


12 5.08e+00 

13 6.04e+01 
15 7.14e+02 



Table 13: Results of velocity field 3 with varying p. 

approximate block LDL^ factorization that eliminates the unknowns layer by layer starting 
from an absorbing layer or boundary condition. What is new is that the Schur complement 
matrices of the block LDL^ factorization are approximated by introducing moving PMLs 
in the interior of the domain. In the 2D case, applying these Schur complement matrices 
corresponds to solving quasi- ID problems by an LU factorization with optimal ordering. 
In the 3D case, applying these Schur complement matrices corresponds to solving quasi-2D 
problems with multifrontal methods. The resulting preconditioner has a linear application 
cost and the number of iterations is essentially independent of the number of unknowns or 
the frequency when combined with the GMRES solver. 

Some questions remain open. First, we tested the algorithms with the PML bound- 
ary condition as the numerical implementation of the Sommerfeld condition. Many other 
boundary conditions are available and we believe that the current algorithms should work 
for these boundary conditions. We presented the algorithms using the simplest central 
difference scheme (5 point stencil in 2D and 7 point stencil in 3D). The dispersion rela- 
tionships of these schemes are rather poor approximations to the true one. One would like 
to investigate other more accurate stencils and other types of discretizations such as finite 
element, spectral element, and discontinuous Galerkin. 

Parallel processing is necessary for large scale 3D problems. Although the overall struc- 
ture of the sweeping preconditioner is sequential by itself, the calculation of the multifrontal 
method within each iteration can be well parallelized. Several efficient implementations are 
already available [1^ l2QJ for this purpose. There is also an alternative to parallelize via a 
coarse scale domain decomposition and apply our technique within each subdomain. 
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The approach of the current paper is readily apphcable to non-uniform and even adap- 
tive grids. In fact, the same non- uniform or adaptive grid can be used for the subproblems 
associated with the moving PMLs, as long as the grid can resolve the moving PML with 
sufficient accuracy. Since the multifrontal methods for non-uniform and adaptive grids are 
readily available [ll |2T] , it makes the current approach more flexible compared with the one 
of the previous paper [T2] based on the hierarchical matrix representation. 
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